AITopics | Snohomish County

Snohomish County

Unsupervised Evaluation of Multi-Turn Objective-Driven Interactions

Soroka, Emi, Chopra, Tanmay, Desai, Krish, Lall, Sanjay

arXiv.org Artificial Intelligence | Nov-6-2025

Large language models (LLMs) have seen increasing popularity in enterprise applications where AI agents and humans engage in objective-driven interactions. However, these systems are difficult to evaluate: data may be complex and unlabeled; human annotation is often impractical at scale; custom metrics can monitor for specific errors, but not previously-undetected ones; and LLM judges can produce unreliable results. We introduce the first set of unsupervised metrics for objective-driven interactions, leveraging statistical properties of unlabeled interaction data and using fine-tuned LLMs to adapt to distributional shifts. We develop metrics for labeling user goals, measuring goal completion, and quantifying LLM uncertainty without grounding evaluations in human-generated ideal responses. Our approach is validated on open-domain and task-specific interaction data.

completion, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2511.03047

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(8 more...)

Genre: Research Report (0.51)

Industry:

Banking & Finance > Insurance (1.00)
Health & Medicine > Health Care Providers & Services (0.93)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

7bc4f74e35bcfe8cfe43b0a860786d6a-Supplemental-Conference.pdf

Neural Information Processing Systems | Sep-28-2025, 09:03:26 GMT

machine learning, missouri valley conference man, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe > Moldova (1.00)
Asia > Middle East > Israel (0.68)
Atlantic Ocean (0.45)
(20 more...)

Genre: Press Release (0.45)

Industry:

Leisure & Entertainment > Sports > Soccer (1.00)
Leisure & Entertainment > Sports > Hockey (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
(16 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)

HistoryBankQA: Multilingual Temporal Question Answering on Historical Events

Mandal, Biswadip, Khandelwal, Anant, Gupta, Manish

arXiv.org Artificial Intelligence | Sep-17-2025

Temporal reasoning about historical events is a critical skill for NLP tasks like event extraction, historical entity linking, temporal question answering, timeline summarization, temporal event clustering and temporal natural language inference. Yet efforts on benchmarking temporal reasoning capabilities of large language models (LLMs) are rather limited. Existing temporal reasoning datasets are limited in scale, lack multilingual coverage and focus more on contemporary events. To address these limitations, we present HistoryBank, a multilingual database of 10M+ historical events extracted from Wikipedia timeline pages and article infoboxes. Our database provides unprecedented coverage in both historical depth and linguistic breadth with 10 languages. Additionally, we construct a comprehensive question answering benchmark for temporal reasoning across all languages. This benchmark covers a diverse set of 6 temporal QA reasoning tasks, and we evaluate a suite of popular language models (LLaMA-3-8B, Mistral-7B, Gemma-2-9b, Qwen3-8B, GPT4o) to assess their performance on these tasks. As expected, GPT4o performs best across all answer types and languages; Gemma-2 outperforms the other small language models. Our work aims to provide a comprehensive resource for advancing multilingual and temporally-aware natural language understanding of historical events. To facilitate further research, we will make our code and datasets publicly available upon acceptance of this paper.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.1272

Country:

Europe > Russia (0.15)
Asia > Russia (0.15)
Asia > Indonesia (0.14)
(96 more...)

Genre: Research Report (0.81)

Industry:

Leisure & Entertainment > Sports (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Media (0.68)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

SPIRA: Building an Intelligent System for Respiratory Insufficiency Detection

Ferreira, Renato Cordeiro, Gomes, Dayanne, Tamae, Vitor, Wernke, Francisco, Goldman, Alfredo

arXiv.org Artificial Intelligence | Aug-13-2025

Respiratory insufficiency is a medical symptom in which a person gets a reduced amount of oxygen in the blood. This paper reports the experience of building SPIRA: an intelligent system for detecting respiratory insufficiency from voice. It compiles challenges faced in two succeeding implementations of the same architecture, summarizing lessons learned on data collection, training, and inference for future projects in similar systems.

architecture, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.5753/ise.2022.227048

2507.04548

Country:

South America > Brazil > São Paulo (0.07)
North America > United States > Washington > Snohomish County > Lynnwood (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.48)
Health & Medicine > Therapeutic Area > Immunology (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Anchored Diffusion Language Model

Rout, Litu, Caramanis, Constantine, Shakkottai, Sanjay

arXiv.org Artificial Intelligence | May-27-2025

Diffusion Language Models (DLMs) promise parallel generation and bidirectional context, yet they underperform autoregressive (AR) models in both likelihood modeling and generated text quality. We identify that this performance gap arises when important tokens (e.g., key words or low-frequency words that anchor a sentence) are masked early in the forward process, limiting contextual information for accurate reconstruction. To address this, we introduce the Anchored Diffusion Language Model (ADLM), a novel two-stage framework that first predicts distributions over important tokens via an anchor network, and then predicts the likelihoods of missing tokens conditioned on the anchored predictions. ADLM significantly improves test perplexity on LM1B and OpenWebText, achieving up to 25.4% gains over prior DLMs, and narrows the gap with strong AR baselines. It also achieves state-of-the-art performance in zero-shot generalization across seven benchmarks and surpasses AR models in MAUVE score, which marks the first time a DLM generates better human-like text than an AR model. Theoretically, we derive an Anchored Negative Evidence Lower Bound (ANELBO) objective and show that anchoring improves sample complexity and likelihood modeling. Beyond diffusion, anchoring boosts performance in AR models and enhances reasoning in math and logic tasks, outperforming existing chain-of-thought approaches.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2505.18456

Country:

Africa > Middle East > Egypt (0.04)
North America > United States > Oklahoma > Oklahoma County > Oklahoma City (0.04)
Asia > Middle East > Jordan (0.04)
(11 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Government (1.00)
Leisure & Entertainment > Sports > Basketball (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Reliable, Routable, and Reproducible: Collection of Pedestrian Pathways at Statewide Scale

Zhang, Yuxiang, Howe, Bill, Caspi, Anat

arXiv.org Artificial Intelligence | Oct-11-2024

While advances in mobility technology including autonomous vehicles and multi-modal navigation systems can improve mobility equity for people with disabilities, these technologies depend crucially on accurate, standardized, and complete pedestrian path networks. Ad hoc collection efforts lead to a data record that is sparse, unreliable, and non-interoperable. This paper presents a sociotechnical methodology to collect, manage, serve, and maintain pedestrian path data at a statewide scale. Combining the automation afforded by computer-vision approaches applied to aerial imagery and existing road network data with the quality control afforded by interactive tools, we aim to produce routable pedestrian pathways for the entire State of Washington within approximately two years. We extract paths, crossings, and curb ramps at scale from aerial imagery, integrating multi-input segmentation methods with road topology data to ensure connected, routable networks. We then organize the predictions into project regions selected for their value to the public interest, where each project region is divided into intersection-scale tasks. These tasks are assigned and tracked through an interactive tool that manages concurrency, progress, feedback, and data management. We demonstrate that our automated systems outperform state-of-the-art methods in producing routable pathway networks, which then significantly reduces the time required for human vetting. Our results demonstrate the feasibility of yielding accurate, robust pedestrian pathway networks at the scale of an entire state. This paper intends to inform procedures for national-scale ADA compliance by providing pedestrian equity, safety, and accessibility, and improving urban environments for all users.

artificial intelligence, machine learning, social media, (20 more...)

arXiv.org Artificial Intelligence

2410.19762

Country:

North America > United States > Washington > Snohomish County (0.04)
North America > United States > Washington > King County (0.04)
North America > United States > Oregon > Multnomah County (0.04)
(3 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Transportation > Infrastructure & Services (0.66)
Transportation > Ground > Road (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.67)
Information Technology > Communications > Social Media > Crowdsourcing (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Urban Mobility Assessment Using LLMs

Bhandari, Prabin, Anastasopoulos, Antonios, Pfoser, Dieter

arXiv.org Artificial Intelligence | Aug-22-2024

Understanding urban mobility patterns and analyzing how people move around cities helps improve the overall quality of life and supports the development of more livable, efficient, and sustainable urban areas. A challenging aspect of this work is the collection of mobility data by means of user tracking or travel surveys, given the associated privacy concerns, noncompliance, and high cost. This work proposes an innovative AI-based approach for synthesizing travel surveys by prompting large language models (LLMs), aiming to leverage their vast amount of relevant background knowledge and text generation capabilities. Our study evaluates the effectiveness of this approach across various U.S. metropolitan areas by comparing the results against existing survey data at different granularity levels. These levels include (i) pattern level, which compares aggregated metrics like the average number of locations traveled and travel time, (ii) trip level, which focuses on comparing trips as whole units using transition probabilities, and (iii) activity chain level, which examines the sequence of locations visited by individuals. Our work covers several proprietary and open-source LLMs, revealing that open-source base models like Llama-2, when fine-tuned on even a limited amount of actual data, can generate synthetic data that closely mimics the actual travel survey data, and as such provides an argument for using such data in mobility studies.

activity chain, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2409.00063

Country:

North America > United States > California > San Francisco County > San Francisco (0.15)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
(9 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Transportation (0.93)
Information Technology (0.86)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

The fatal mistake a Tesla driver made before killing 'kind and outgoing' 28-year-old in Washington

Daily Mail - Science & tech | Jul-31-2024, 16:34:16 GMT

Authorities have confirmed that a Tesla on autopilot was partly responsible for a crash in Washington that killed a motorcyclist. Jeffrey Nissen, 28, was traveling about 15 miles northeast of Seattle when a Model S came from behind and rammed him off his bike before running him over. Investigators from the Washington State Patrol found the Tesla driver was operating on the company's 'Full Self Driving' (FSD) and had looked at his cell phone while the vehicle was moving. Nissen was found under the car and pronounced dead at the scene, authorities reported. The 56-year-old driver was arrested for investigation of vehicular homicide.

artificial intelligence, tesla driver, vehicle, (12 more...)

Daily Mail - Science & tech

Country:

North America > United States > Washington > King County > Seattle (0.40)
North America > United States > Washington > Snohomish County (0.05)
North America > United States > Colorado (0.05)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.71)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)

Data-Driven Ergonomic Risk Assessment of Complex Hand-intensive Manufacturing Processes

Krishnan, Anand, Yang, Xingjian, Seth, Utsav, Jeyachandran, Jonathan M., Ahn, Jonathan Y., Gardner, Richard, Pedigo, Samuel F., Blom-Schieber, Adriana, Banerjee, Ashis G., Manohar, Krithika

arXiv.org Artificial Intelligence | Mar-5-2024

Hand-intensive manufacturing processes, such as composite layup and textile draping, require significant human dexterity to accommodate task complexity. These strenuous hand motions often lead to musculoskeletal disorders and rehabilitation surgeries. We develop a data-driven ergonomic risk assessment system with a special focus on hand and finger activity to better identify and address ergonomic issues related to hand-intensive manufacturing processes. The system comprises a multi-modal sensor testbed to collect and synchronize operator upper body pose, hand pose and applied forces; a Biometric Assessment of Complete Hand (BACH) formulation to measure high-fidelity hand and finger risks; and industry-standard risk scores associated with upper body posture, RULA, and hand activity, HAL. Our findings demonstrate that BACH captures injurious activity with a higher granularity in comparison to the existing metrics. Machine learning models are also used to automate RULA and HAL scoring, and generalize well to unseen participants. Our assessment system, therefore, provides ergonomic interpretability of the manufacturing processes studied, and could be used to mitigate risks through minor workplace optimization and posture corrections.

artificial intelligence, machine learning, predicted, (18 more...)

arXiv.org Artificial Intelligence

2403.05591

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > United Kingdom (0.04)
Asia > India (0.04)
(7 more...)

Genre: Research Report > New Finding (0.86)

Industry:

Health & Medicine (1.00)
Information Technology > Security & Privacy (0.61)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Titan submersible recovery efforts continue with help of remotely operated vehicle

FOX News | Jun-26-2023, 01:36:57 GMT

Navy SEAL Jake Zweig responds to the intense search for the missing Titanic submarine on 'Fox & Friends.' Efforts to recover the remains of the Titan submersible that suffered a catastrophic implosion near the Titanic wreckage are currently underway, and as of Sunday, had descended to the seafloor for a fourth dive. Last Thursday, the U.S. Coast Guard confirmed that a debris field located about 1,600 feet from the wreckage of the Titanic was in fact that of the missing Titan submersible. The underwater vessel was carrying five men on board when it lost contact with its surface ship about an hour and 45 minutes after descending to the Titanic. South Wellfleet, Massachusetts-based Pelagic Research Services (PRS) was contacted by OceanGate, the company behind Titan, for use of its remotely operated vehicles, or "ROVs," to assist with the search. Pelagic Research Services continues to assist the Transportation Safety Board of Canada, U.S. Coast Guard, and U.S. National Transportation Safety Board with Titan recovery efforts near the Titanic wreckage.

artificial intelligence, pelagic research service, transportation safety board, (12 more...)

FOX News

Country:

North America > Canada (0.39)
North America > United States > Massachusetts (0.26)
North America > United States > Washington > Snohomish County > Everett (0.06)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military > Coast Guard (1.00)

Technology: Information Technology > Artificial Intelligence > Robots (0.62)
